A Hybrid Topic Model for Multi-Document Summarization

Authors

  • Jin An Xu
  • Jiangming Liu
  • Kenji Araki
Abstract

Topic features are useful for improving text summarization. However, most topic models assume that topics are independent of one another, which is a strong restriction; relaxing it allows text structure to be captured more fully. This paper proposes a hybrid topic model that generates multi-document summaries by combining the Hidden Topic Markov Model (HTMM), the surface texture model, and the topic transition model. The topic transition model supplies regular topic transition probabilities, which are used during summary generation. This approach removes the topic independence assumption of the Latent Dirichlet Allocation (LDA) model. Experimental results show the advantage of combining the three kinds of models. The contributions of this paper are relaxing topic independence and integrating the surface texture and shallow semantics of documents to improve summarization. In short, this paper attempts to realize an advanced summarization system.

Key words: multi-document summarization, hybrid topic model, hidden topic Markov model (HTMM), surface texture model, topic transition model
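
As a rough illustration of how topic transition probabilities could inform summary generation, the sketch below estimates a first-order transition matrix from per-sentence topic labels and uses it to rank candidate next sentences. It is not the authors' model: the function names, the additive smoothing, and the rule of multiplying a relevance score by the transition probability are all assumptions made for this example, and the topic labels are assumed to come from some upstream topic model such as an HTMM.

```python
def transition_matrix(topic_sequences, num_topics, smoothing=1.0):
    """Estimate P(next topic | current topic) from sequences of sentence topics.

    topic_sequences: one list of integer topic ids per document, in sentence order.
    smoothing: additive (Laplace) smoothing so unseen transitions keep some mass
               (an assumption of this sketch, not taken from the paper).
    """
    counts = [[smoothing] * num_topics for _ in range(num_topics)]
    for seq in topic_sequences:
        # Count each adjacent pair (topic of sentence i, topic of sentence i+1).
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    # Normalize each row into a probability distribution.
    return [[c / sum(row) for c in row] for row in counts]


def pick_next(current_topic, candidates, trans):
    """Pick the candidate maximizing relevance * P(candidate topic | current topic).

    candidates: list of (sentence, topic_id, relevance) tuples (hypothetical format).
    """
    return max(candidates, key=lambda c: c[2] * trans[current_topic][c[1]])


# Toy data: two documents whose sentences were labeled with topics 0, 1, 2.
docs = [[0, 0, 1, 2], [0, 1, 1, 2]]
trans = transition_matrix(docs, num_topics=3)

candidates = [("a", 0, 0.5), ("b", 1, 0.5), ("c", 2, 0.9)]
best = pick_next(0, candidates, trans)
```

In this toy data, sentence "c" has the highest relevance, but a jump from topic 0 to topic 2 was never observed, so the transition term favors "b". That is the intuition behind dropping the topic independence assumption: the order of topics carries information.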

Similar resources

A Hybrid Hierarchical Model for Multi-Document Summarization

Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem: building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics u...

Multi-Document Summarization using Sentence-based Topic Models

Most of the existing multi-document summarization methods decompose the documents into sentences and work directly in the sentence space using a term-sentence matrix. However, the knowledge on the document side, i.e. the topics embedded in the documents, can help the context understanding and guide the sentence selection in the summarization procedure. In this paper, we propose a new Bayesian s...

A Novel Feature-based Bayesian Model for Query Focused Multi-document Summarization

Supervised learning methods and LDA-based topic models have been successfully applied in the field of multi-document summarization. In this paper, we propose a novel supervised approach that can incorporate rich sentence features into Bayesian topic models in a principled way, thus taking advantage of both topic models and feature-based supervised learning methods. Experimental results on DUC200...

Query-focused Multi-Document Summarization: Combining a Topic Model with Graph-based Semi-supervised Learning

Graph-based learning algorithms have been shown to be an effective approach for query-focused multi-document summarization (MDS). In this paper, we extend the standard graph ranking algorithm by proposing a two-layer (i.e. sentence layer and topic layer) graph-based semi-supervised learning approach based on topic modeling techniques. Experimental results on TAC datasets show that by considerin...

Global and Local Models for Multi-Document Summarization

In this paper we study the effectiveness of combining corpus-level (global) tag-topic models and target document set level local models for multi-document summarization. Recently tag-topic models that exploit both word level annotation (e.g. named entity type) and/or document level metadata (e.g. words related to topic categories) have been proposed to model documents tagged from two different ...

Journal title:
  • IEICE Transactions

Volume 98-D  Issue 

Pages  -

Publication date 2015